Speech is now common in daily interactions with our devices, thanks to voice user interfaces (VUIs) like Alexa. Despite their seeming ubiquity, designs often do not match users’ expectations. Science fiction, which is known to influence design of new technologies, has included VUIs for decades. Star Trek: The Next Generation is a prime example of how people envisioned ideal VUIs. Understanding how current VUIs live up to Star Trek’s utopian technologies reveals mismatches between current designs and user expectations, as informed by popular fiction. Combining conversational analysis and VUI user analysis, we study voice interactions with the Enterprise’s computer and compare them to current interactions. Independent of futuristic computing power, we find key design-based differences: Star Trek interactions are brief and functional, not conversational, they are highly multimodal and context-driven, and there is often no spoken computer response. From this, we suggest paths to better align VUIs with user expectations.
Data source: https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-08-17/readme.md
In Star Trek, characters may interact with the holodeck, music, or the replicator. In this graphic, we visualize how often the main characters interacted with the computer over each season.
The holodeck is a 3D simulation room, “music” is when a character requests the computer to play music, and “replicator” is a device the character can use to create food or drink (similar to a 3D food printer).
We can see from the heat map on the left that the holodeck was the subdomain that was interacted with the most, across all seasons and characters. Geordi LaForge interacted with the holodeck the most, primarily in seasons three and four. Picard had the most consistent use of the holodeck over all seasons. It is unique that Geordi had the most interactions with entertainment subdomains across all seasons, because he is not always a main character in the show; however, he often interacts with the holodeck system on behalf of other people (to set holodeck programs to challenge other characters) or for work-related purposes (experimenting with engineering via simulation scenarios).
When we interact with voice-command technology, we use certain types of interactions to ‘wake’ the system (“Hey Siri…”), ‘command’ the system (“Play a song on Spotify”), ‘question’ the system (“What is the temperature for today?”), and many other types of interactions.
These interaction types can exist in a chain, such as “Hey Siri, Play a song on Spotify”. However, the primary type of interaction in this phrase is the command to have Siri play a song on Spotify.
On the Starship Enterprise, the crew interacts with the Computer through different primary ‘Interaction Types’. Definitions and examples of these interaction types can be found below.
| Interaction Type | Definition | Examples |
|---|---|---|
| Command | Utterances that directly tell the computer what to do. | Run a diagnostic on the port nacelle. |
| Question | Utterances that ask the computer for something. | Where is Captain Picard? |
| Statement | Utterances tell don’t tell the computer or ask it, but meaning is inferred. | Deck four. I wish to learn about Earth. |
| Password | Utterances that contain a password. | This is Captain Picard. |
| Wake Word | Key phrases used to activate the computer. | Computer. Holodeck. |
| Comment | Utterances that have no intended action for the computer. | Excellent. Ferrazene has a complex molecular structure. |
| Conversation | Utterances that are more like human conversation, such as phatic espressions, formalities, and colloquial speech. | Well, check it again! Then run it for us, dear. |
Because the Computer on the Starship Enterprise can generate objects and display information without responding, it is of interest to examine the proportion of occurrences when the computer responds verbally or non-verbally (which includes through actions only).
The visualizations to the left shows the proportion of verbal versus non-verbal responses, according to interaction type by person. This information can help us understand what types of interactions are more likely to result in verbal or non-verbal responses from the Starship Enterprise Computer.
Via the data visualizations created from proportion tests, we can see that Wake Word, Question, Conversation, and Password interactions are most likely to result in a Verbal response from the Computer, and Statement, Command, Comment interactions were found to result in either Verbal or Non-Verbal Computer response fairly equally. One limitation of this analysis is that sample size for certain combinations of interactions and responses are low.
Data source: http://www.speechinteraction.org/TNG/TeaEarlGreyHotDatasetCodeBook.pdf
Using the lexicon bing, we performed a sentiment analysis to examine the count of positive and negative connotation of words spoken to the computer (via five main characters) and by the computer to all characters in the show.
The computer tends to use more negative words. Riker tends to use more positive words. All other main characters analyzed showed equal use of positive or negative words.
Using the lexicon NRC, we performed a sentiment analysis to examine the emotional connotation of words spoken to the computer (via five main characters) and by the computer to all characters in the show. The computer tended to express trust and fear most. All other main five characters analyzed expressed trust most often. Beverly was more varied in her emotion-based words, but did express trust most often.
Using the lexicon affin, we performed a sentiment analysis to examine the gradient of emotions (positive: 4 to negative: -4) of words spoken to the computer (via five main characters) and by the computer to all characters in the show.
The computer and Geordi had more emotional range across the gradient from positive to negative. Data and Riker used words that were on the positive side of the gradient most often. Picard and Beverly used words that were on the negative side of the gradient most often.
When looking at text, something that may come up is how common our choice of words can be. A great way to visualize this idea is with word clouds! A bundle of words with varying size, related to how often that word was used.
This image was created using the spoken lines from all of the characters (except the computer) and each word was individually counted. Interestingly, “program” appears to be the most common word with 154 uses, however, the most used word was “computer” with 1025 uses. Wouldn’t be much of a word cloud when a single word is the cloud. By removing the extreme outlier we were able to make a beautiful image that visualizes the Star Trek speech.
This second word cloud helps us to visually compare the character word choice to the computer word choice. We can see that the computer has limited variability in its word use by the limited size of the cloud. In comparison the characters used 1521 different words compared to the computers 287.
This is similar to our use in computer AI technology with Alexa, Cortana, and Siri devises. Though computers can be smart, they only know what we provide them, this typically being simple commands and responses.
Interestingly, with another quick comparison using an inner join, they only share 169 words. This left the computer with 118 unique words and the characters with 1352 unique words. Maybe the computers are more intelligent then we give them credit for…
---
title: "Tea, Earl Grey, Hot: Designing Speech Interactions from the Imagined Ideal of Star Trek"
output:
flexdashboard::flex_dashboard:
storyboard: true
social: menu
source: embed
theme: spacelab
---
```{r setup, include=FALSE}
library(flexdashboard)
library(readr)
library(knitr)
library(tidyverse)
library(purrr)
library(broom)
library(plotly)
library(wordcloud)
library(RColorBrewer)
library(tm)
library(wordcloud2)
library(dplyr)
library(ggplot2)
library(forcats)
library(tidytext)
library(textdata)
library(viridis)
startrek <- read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-08-17/computer.csv')
```
### Data Description
```{r}
include_graphics('https://raw.githubusercontent.com/LaurS12/ERHS535_Group_Project/main/Images/data_description.png')
#note: this looks like garbage in the markdown file, but if you knit, it shows up correct.
```
***
Speech is now common in daily interactions with our devices, thanks to voice user interfaces (VUIs) like Alexa. Despite their seeming ubiquity, designs often do not match users’ expectations. Science fiction, which is known to influence design of new technologies, has included VUIs for decades. Star Trek: The Next Generation is a prime example of how people envisioned ideal VUIs. Understanding how current VUIs live up to Star Trek’s utopian technologies reveals mismatches between current designs and user expectations, as informed by popular fiction. Combining conversational analysis and VUI user analysis, we study voice interactions with the Enterprise’s computer and compare them to current interactions. Independent of futuristic computing power, we find key design-based differences: Star Trek interactions are brief and functional, not conversational, they are highly multimodal and context-driven, and there is often no spoken computer response. From this, we suggest paths to better align VUIs with user expectations.
Data source: https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-08-17/readme.md
### Character Voice Interactions with Entertainment Computers
``` {r season_column, results='hide', include=FALSE}
# 1: 26, 2: 48, 3: 74, 4: 100, 5: 126, 6: 152, 7: 178
favthings <- startrek %>%
mutate(season = case_when(name < 127 ~ '1',
name < 149 ~ '2',
name < 175 ~ '3',
name < 201 ~ '4',
name < 227 ~ '5',
name < 253 ~ '6',
name < 279 ~ '7'))
#created the season column
```
``` {r select_domains, results='hide', include=FALSE}
table(favthings$char)
favthings <- favthings %>%
filter(sub_domain %in% c("Holodeck", "Replicator", "Music")) %>%
filter(char %in% c("Beverly", "Data", "Geordi", "Picard", "Riker", "Troi", "Worf"))
# filtering to specific subdomains and main characters
```
``` {r frequencyvector, results='hide', include=FALSE}
freqcom <- favthings %>%
group_by(char, sub_domain, season) %>%
count() %>%
ungroup()%>%
mutate(char = fct_reorder(char, n, .fun=sum))
summarise(freqcom)
#grouping and reordering results so it shows up pretty on the heatmap
```
``` {r chartidea}
freqmap <- ggplot(freqcom, aes(season, char)) +
geom_tile(aes(fill = n)) +
facet_wrap(~ sub_domain) +
scale_fill_viridis_c()+
labs(x = "Season", y = "",
fill = "Frequency",
title = "Entertainment Subdomain Voice Interaction by Season") +
theme_classic()+
theme(text = element_text(family = "Optima"),
legend.position = "bottom")
freqmap
#removed ncol =1
# frequency heat map
```
***
In Star Trek, characters may interact with the holodeck, music, or the replicator. In this graphic, we visualize how often the main characters interacted with the computer over each season.
The holodeck is a 3D simulation room, "music" is when a character requests the computer to play music, and "replicator" is a device the character can use to create food or drink (similar to a 3D food printer).
We can see from the heat map on the left that the holodeck was the subdomain that was interacted with the most, across all seasons and characters. Geordi LaForge interacted with the holodeck the most, primarily in seasons three and four. Picard had the most consistent use of the holodeck over all seasons. It is unique that Geordi had the most interactions with entertainment subdomains across all seasons, because he is not always a main character in the show; however, he often interacts with the holodeck system on behalf of other people (to set holodeck programs to challenge other characters) or for work-related purposes (experimenting with engineering via simulation scenarios).
### Verbal vs. Non-Verbal Computer Responses per Primary Types of Voice Interactions
```{r, results='hide'}
no_comp_voice <- startrek %>%
filter(char != "Computer Voice") %>%
filter(char != "Computer") %>%
filter(char != "Computer (V.O.)") %>%
filter(char != "Computer (V.O)") %>%
filter(char != "Computer Voice (V.O.)") %>%
filter(char != "New Computer Voice") %>%
filter(char != "Com Panel (V.O.)") %>%
filter(char != "Computer'S Voice") %>%
filter(char != "Computer (Voice)") %>%
filter(char != "Computer Voice (Cont'D)")
no_comp_voice <- no_comp_voice %>%
select('pri_type', 'nv_resp')
no_comp_voice$nv_resp <- as.factor(no_comp_voice$nv_resp)
no_comp_voice$pri_type <- as.factor(no_comp_voice$pri_type)
no_comp_voice$nv_resp <- no_comp_voice$nv_resp %>%
recode_factor("TRUE" = "Non-Verbal Response") %>%
recode_factor("FALSE" = "Verbal Response")
levels(no_comp_voice$pri_type)
no_comp_voice <- no_comp_voice %>%
group_by(pri_type, nv_resp) %>%
tally()
no_comp_voice <- no_comp_voice %>%
pivot_wider(names_from = nv_resp, values_from = n)
no_comp_voice[is.na(no_comp_voice)] = 0
no_comp_voice <- no_comp_voice %>%
rename(n_verbal = "Verbal Response") %>%
rename(n_non_verbal = "Non-Verbal Response")
no_comp_voice$total_resp <- no_comp_voice$n_verbal + no_comp_voice$n_non_verbal
no_comp_voice
prop_verbal <- no_comp_voice %>%
mutate(prop_test = purrr::map2(.x= n_verbal,
.y= total_resp,
.f= prop.test))
prop_verbal <- prop_verbal %>%
mutate(prop_tidy = purrr::map(prop_test, ~tidy(.x)))
prop_non_verbal <- no_comp_voice %>%
mutate(prop_test = purrr::map2(.x= n_non_verbal,
.y= total_resp,
.f= prop.test))
prop_non_verbal <- prop_non_verbal %>%
mutate(prop_tidy = purrr::map(prop_test, ~tidy(.x)))
prop_verbal <- prop_verbal%>%
unnest(prop_tidy)
prop_non_verbal <- prop_non_verbal%>%
unnest(prop_tidy)
prop_verbal <- prop_verbal %>%
select(-prop_test)
prop_non_verbal <- prop_non_verbal %>%
select(-prop_test)
prop_verbal <- prop_verbal %>%
select(pri_type, estimate, conf.low, conf.high, n_verbal)
prop_non_verbal <- prop_non_verbal %>%
select(pri_type, estimate, conf.low, conf.high, n_non_verbal)
prop_verbal <- prop_verbal %>%
mutate(estimate = as.numeric(estimate),
conf.low = as.numeric(conf.low),
conf.high = as.numeric(conf.high))
prop_non_verbal <- prop_non_verbal %>%
mutate(estimate = as.numeric(estimate),
conf.low = as.numeric(conf.low),
conf.high = as.numeric(conf.high))
prop_verbal <- prop_verbal %>%
arrange(desc(estimate))
prop_non_verbal <- prop_non_verbal %>%
arrange(desc(estimate))
prop_verbal$resp <- "Verbal"
prop_non_verbal$resp <- "Non-Verbal"
resp_per_int <- rbind(prop_verbal, prop_non_verbal)
resp_per_int[is.na(resp_per_int)] = 0
resp_per_int$n <- resp_per_int$n_non_verbal + resp_per_int$n_verbal
resp_per_int <- resp_per_int %>%
select(-n_non_verbal) %>%
select(-n_verbal)
resp_per_int$resp <- as.factor(resp_per_int$resp)
resp_per_int$pri_type <- factor(resp_per_int$pri_type, levels = c("Password", "Conversation", "Question", "Wake Word", "Comment", "Command", "Statement"))
```
```{r, include=FALSE}
chart_3 <- resp_per_int %>%
ungroup() %>%
ggplot(aes(label=conf.low,
label2=conf.high,
label3=n))+
geom_col(aes(x=estimate, y=pri_type, fill=resp), position="fill")+
labs(title= "Proportions of Computer Response Type",
y= "Person Interaction Type",
x= "Percent of Responses",
subtitle = "Bars show 95% confidence interval",
fill = "")+
scale_x_continuous(labels = scales::percent)+
scale_fill_brewer(palette = "Paired")
theme(plot.title = element_text(hjust = -0.45, vjust=2.12))+
theme_bw()
```
```{r}
ggplotly(chart_3, height=500, width=900)
```
***
When we interact with voice-command technology, we use certain types of interactions to 'wake' the system ("Hey Siri..."), 'command' the system ("Play a song on Spotify"), 'question' the system ("What is the temperature for today?"), and many other types of interactions.
These interaction types can exist in a chain, such as "Hey Siri, Play a song on Spotify". However, the primary type of interaction in this phrase is the command to have Siri play a song on Spotify.
On the Starship Enterprise, the crew interacts with the Computer through different primary 'Interaction Types'. Definitions and examples of these interaction types can be found below.
| Interaction Type | Definition | Examples |
|------------------|-------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------|
| Command | Utterances that directly tell the computer what to do. | Run a diagnostic on the port nacelle. |
| Question | Utterances that ask the computer for something. | Where is Captain Picard? |
| Statement | Utterances tell don't tell the computer or ask it, but meaning is inferred. | Deck four. I wish to learn about Earth. |
| Password | Utterances that contain a password. | This is Captain Picard. |
| Wake Word | Key phrases used to activate the computer. | Computer. Holodeck. |
| Comment | Utterances that have no intended action for the computer. | Excellent. Ferrazene has a complex molecular structure. |
| Conversation | Utterances that are more like human conversation, such as phatic espressions, formalities, and colloquial speech. | Well, check it again! Then run it for us, dear. |
Because the Computer on the Starship Enterprise can generate objects and display information without responding, it is of interest to examine the proportion of occurrences when the computer responds verbally or non-verbally (which includes through actions only).
The visualizations to the left shows the proportion of verbal versus non-verbal responses, according to interaction type by person. This information can help us understand what types of interactions are more likely to result in verbal or non-verbal responses from the Starship Enterprise Computer.
Via the data visualizations created from proportion tests, we can see that Wake Word, Question, Conversation, and Password interactions are most likely to result in a Verbal response from the Computer, and Statement, Command, Comment interactions were found to result in either Verbal or Non-Verbal Computer response fairly equally. One limitation of this analysis is that sample size for certain combinations of interactions and responses are low.
Data source: http://www.speechinteraction.org/TNG/TeaEarlGreyHotDatasetCodeBook.pdf
### Positive and Negative Words in Voice Interactions
```{r, results='hide', include=FALSE}
#tidy the data
tibble(startrek)
startrek_tidy2 <- startrek %>%
select(char, line) %>%
unnest_tokens(words, line) %>% #makes each word an observation
rename("script_words" = words) %>%
group_by(char) %>%
mutate(char = recode(char, 'Geordi (V.O.)' = "Geordi",'Geordi (O.S.)' = "Geordi",
'Computer (V.O.)' = "Computer", 'Computer Voice' = "Computer",
'Computer Voice (V.O.)' = "Computer",
'New Computer Voice' = "Computer",
'Riker (O.S.)' = "Riker",
'Picard (O.S.)' = "Picard", 'Picard (V.O.)' = "Picard", 'Young Picard' = "Picard",
'Jean-Luc' = "Picard"))
select_char <- c("Picard", "Geordi", "Data", "Riker", "Computer", "Beverly")
startrek_tidy3 <- filter(startrek_tidy2, char %in% select_char)
######Sentiment Analysis######
##bing
bing_join <- get_sentiments("bing") %>%
inner_join(startrek_tidy3, by = c("word" = "script_words")) %>%
group_by(sentiment, char) %>%
count()
```
```{r}
plot_bing <- bing_join %>%
ggplot(aes(char, sentiment, fill = sentiment, y = n)) +
geom_bar(position = "stack", stat = "identity") +
xlab("Star Trek Character") + ylab("Number of words") +
labs(title = "TNG Character and Computer Interactions",
subtitle = "Word Sentiment Analysis of the 'Bing' Lexicon",
caption = "Data source: www.tidytuesday.com") +
theme(
plot.title = element_text(size = 14, hjust = 0.5),
plot.subtitle = element_text(size = 11, hjust = 0.5),
plot.caption = element_text(size = 9, hjust = 0.5),
legend.title = element_blank(),
panel.grid.minor.x=element_blank(),
panel.grid.major.x=element_blank(),
axis.text.x = element_text(angle = 60, vjust = 0.5, hjust = 0.5),
) +
scale_fill_brewer(palette = "Paired")
print(plot_bing)
```
***
Using the lexicon bing, we performed a sentiment analysis to examine the count of positive and negative connotation of words spoken to the computer (via five main characters) and by the computer to all characters in the show.
The computer tends to use more negative words. Riker tends to use more positive words. All other main characters analyzed showed equal use of positive or negative words.
### Emotions Expressed by Words in Voice Interactions
```{r, results='hide', include=FALSE}
###nrc (feelings)
omit_sent <- c("positive", "negative") #take out data with pos/neg because already analyzed
sent_nrc <- get_sentiments("nrc") %>%
filter(!sentiment %in% omit_sent)
nrc_join <- sent_nrc %>%
inner_join(startrek_tidy3, by = c("word" = "script_words")) %>%
group_by(sentiment, char) %>%
count()
```
```{r}
plot_nrc <- nrc_join %>%
ggplot(aes(char, sentiment, fill = sentiment, y = n)) +
geom_bar(position = "stack", stat = "identity") +
xlab("Star Trek Character") + ylab("Number of words") +
labs(title = "TNG Character and Computer Interactions",
subtitle = "Word Sentiment Analysis of the 'NRC' Lexicon",
caption = "Data source: www.tidytuesday.com") +
theme(
plot.title = element_text(size = 14, hjust = 0.5),
plot.subtitle = element_text(size = 11, hjust = 0.5),
plot.caption = element_text(size = 8, hjust = 0.5),
legend.title = element_blank(),
panel.grid.minor.x=element_blank(),
panel.grid.major.x=element_blank(),
axis.text.x = element_text(angle = 60, vjust = 0.5, hjust = 0.5),
)
print(plot_nrc)
```
***
Using the lexicon NRC, we performed a sentiment analysis to examine the emotional connotation of words spoken to the computer (via five main characters) and by the computer to all characters in the show.
The computer tended to express trust and fear most. All other main five characters analyzed expressed trust most often. Beverly was more varied in her emotion-based words, but did express trust most often.
### Examination of a Positive to Negative Gradient of Word Connotation
```{r, results='hide', include=FALSE}
###afinn (score)
sent_afinn <- get_sentiments("afinn") #score
afinn_join <- sent_afinn %>%
inner_join(startrek_tidy3, by = c("word" = "script_words")) %>%
group_by(value, char) %>%
count()
x_labels = c(-4:4)
```
```{r}
plot_afinn <- afinn_join %>%
ggplot(aes(value, n, fill = value)) +
geom_bar(position = "dodge", stat = "identity") +
scale_fill_viridis(option = "D") +
scale_color_viridis(option = "D") +
facet_wrap(vars(char)) +
xlab("Sentiment Score") +
ylab("Word Count") +
labs(title = "TNG Character and Computer Interactions",
subtitle = "Word Sentiment Analysis of the 'Afinn' Lexicon",
caption = "Data source: www.tidytuesday.com",
color = "Word Count") +
theme(
plot.title = element_text(size = 14, hjust = 0.5),
plot.subtitle = element_text(size = 11, hjust = 0.5),
plot.caption = element_text(size = 9, hjust = 0.5),
panel.grid.minor.x=element_blank(),
panel.grid.major.x=element_blank(),
panel.grid.minor.y=element_blank(),
panel.grid.major.y=element_blank()) +
theme(legend.position = "bottom", legend.title = element_blank()) +
scale_x_continuous(labels = x_labels, breaks = x_labels)
print(plot_afinn)
```
***
Using the lexicon affin, we performed a sentiment analysis to examine the gradient of emotions (positive: 4 to negative: -4) of words spoken to the computer (via five main characters) and by the computer to all characters in the show.
The computer and Geordi had more emotional range across the gradient from positive to negative. Data and Riker used words that were on the positive side of the gradient most often. Picard and Beverly used words that were on the negative side of the gradient most often.
```{r, results='hide', include=FALSE}
# Courtney data cleaning
comp_voice <- startrek %>%
filter(char == c("Computer Voice", "Computer", "Computer (V.O.)",
"Computer (V.O)", "Computer Voice (V.O.)", "New Computer Voice",
"Com Panel (V.O.)", "Computer'S Voice", "Computer (Voice)",
"Computer Voice (Cont'D)"))
person_voice <- startrek %>%
filter(char != "Computer Voice") %>%
filter(char != "Computer") %>%
filter(char != "Computer (V.O.)") %>%
filter(char != "Computer (V.O)") %>%
filter(char != "Computer Voice (V.O.)") %>%
filter(char != "New Computer Voice") %>%
filter(char != "Com Panel (V.O.)") %>%
filter(char != "Computer'S Voice") %>%
filter(char != "Computer (Voice)") %>%
filter(char != "Computer Voice (Cont'D)")
```
### How common is each word used by the characters?
```{r}
# Person lines only
# Filter to necessary column
textperson <- person_voice$interaction
# Clean text
docsperson <- Corpus(VectorSource(textperson))
docsperson <- docsperson %>%
tm_map(removeNumbers) %>%
tm_map(removePunctuation) %>%
tm_map(stripWhitespace)
docsperson <- tm_map(docsperson, content_transformer(tolower))
docsperson <- tm_map(docsperson, removeWords, stopwords("english"))
# Create matrix with counts
dtmperson <- TermDocumentMatrix(docsperson)
matrixperson <- as.matrix(dtmperson)
wordsperson <- sort(rowSums(matrixperson),decreasing = TRUE)
dfperson <- data.frame(word = names(wordsperson), freq = wordsperson)
# Person Wordcloud
wordcloud2(data = dfperson, size = 2.5, color= "random-light", shape = "circle", backgroundColor = "black")
```
***
When looking at text, something that may come up is how common our choice of words can be. A great way to visualize this idea is with word clouds! A bundle of words with varying size, related to how often that word was used.
This image was created using the spoken lines from all of the characters (except the computer) and each word was individually counted. Interestingly, "program" appears to be the most common word with 154 uses, however, the most used word was "computer" with 1025 uses. Wouldn't be much of a word cloud when a single word is the cloud. By removing the extreme outlier we were able to make a beautiful image that visualizes the Star Trek speech.
### How common is each word used by the computer?
```{r}
# Computer lines only
# Filter to necessary column
textcomp <- comp_voice$interaction
# Clean text
docscomp <- Corpus(VectorSource(textcomp))
docscomp <- docscomp %>%
tm_map(removeNumbers) %>%
tm_map(removePunctuation) %>%
tm_map(stripWhitespace)
docscomp <- tm_map(docscomp, content_transformer(tolower))
docscomp <- tm_map(docscomp, removeWords, stopwords("english"))
# Create matrix with counts
dtmcomp <- TermDocumentMatrix(docscomp)
matrixcomp <- as.matrix(dtmcomp)
wordscomp <- sort(rowSums(matrixcomp),decreasing = TRUE)
dfcomp <- data.frame(word = names(wordscomp), freq = wordscomp)
# Computer Wordcloud
compcloud<-wordcloud2(data = dfcomp, size = 0.5, color= "random-light", shape= "circle", backgroundColor = "black")
library(htmlwidgets)
webshot::install_phantomjs()
saveWidget(compcloud,"1.html",selfcontained = F)
webshot::webshot("1.html","1.png",vwidth = 700, vheight = 500, delay =10)
```
***
This second word cloud helps us to visually compare the character word choice to the computer word choice. We can see that the computer has limited variability in its word use by the limited size of the cloud. In comparison the characters used 1521 different words compared to the computers 287.
This is similar to our use in computer AI technology with Alexa, Cortana, and Siri devises. Though computers can be smart, they only know what we provide them, this typically being simple commands and responses.
Interestingly, with another quick comparison using an inner join, they only share 169 words. This left the computer with 118 unique words and the characters with 1352 unique words. Maybe the computers are more intelligent then we give them credit for...